Efficient Selection and Integration of Hidden Web Database

Authors

  • Xuefeng Xian
  • Pengpeng Zhao
  • Yuanfeng Yang
  • Jie Xin
  • Zhiming Cui
Abstract

An ever-increasing amount of valuable information is stored in web databases, "hidden" behind search interfaces. This has opened a new application area for information retrieval and integration. Hundreds or thousands of web databases may provide data relevant to a particular domain on the web. A primary challenge for internet-scale hidden web database integration is therefore to determine which web databases to include in the integration system, so that the system contains as much high-quality data as possible with the least degree of overlap. In this paper, we present an approach that iteratively selects and integrates candidate web databases. The core of this approach is a benefit function that evaluates how much benefit a web database brings to a given state of the integration system when it is integrated. We devise a benefit function based on the volume and quality of the new data that integrating the web database adds to the integration system. We show in practice how to apply our approach efficiently to select and integrate web databases. Our experiments on real hidden web databases indicate that the set of web databases selected and integrated by our approach yields an integration system with significantly higher utility than a wide range of other strategies.
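The iterative selection described in the abstract can be illustrated with a small greedy sketch. This is not the authors' benefit function, which the abstract does not specify. It assumes, hypothetically, that each web database is a mapping from record IDs to a quality score in [0, 1], and that the benefit of a database is the total quality of the records it would add that the integration system does not already hold. The names `benefit`, `select_databases`, and `budget` are illustrative only.

```python
from typing import Dict, List, Set

# Hypothetical representation: a web database maps record IDs to a
# quality score in [0, 1]. The paper's actual data model is not given.
Database = Dict[str, float]


def benefit(db: Database, integrated: Set[str]) -> float:
    """Illustrative benefit: total quality of records not yet integrated.

    Records already held by the integration system contribute nothing,
    so heavily overlapping databases score low.
    """
    return sum(q for rec, q in db.items() if rec not in integrated)


def select_databases(candidates: Dict[str, Database], budget: int) -> List[str]:
    """Greedily pick up to `budget` databases by marginal benefit."""
    integrated: Set[str] = set()      # record IDs already in the system
    remaining = dict(candidates)      # candidates not yet selected
    selected: List[str] = []
    while remaining and len(selected) < budget:
        # Re-evaluate every candidate against the current system state.
        best = max(remaining, key=lambda name: benefit(remaining[name], integrated))
        if benefit(remaining[best], integrated) <= 0:
            break                     # nothing new left to gain
        selected.append(best)
        integrated.update(remaining.pop(best))  # adds the record IDs
    return selected


candidates = {
    "A": {"r1": 0.9, "r2": 0.8, "r3": 0.7},
    "B": {"r1": 0.9, "r2": 0.8, "r4": 0.2},  # mostly overlaps with A
    "C": {"r5": 0.6, "r6": 0.5},
}
print(select_databases(candidates, budget=2))
```

In this toy input, database B looks valuable on its own, but once A is integrated it adds only one low-quality record, so the sketch prefers C. Recomputing the benefit after every selection is what makes the procedure sensitive to overlap.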

Similar resources

Sources Selection Methodology for Hidden Web Data Integration

In internet-scale hidden web data integration, the problem of source (web database) selection has been a primary challenge. This paper proposes a novel approach to web database selection for internet-scale hidden web data integration. This approach is based on a benefit function that evaluates how much benefit a web database brings to a given state of the integration system by integrating ...

Enhancing Information Retrieval by Integration Invisible Web Data Source

Most web structures are huge and intricate, and users often lose track of the purpose of their inquiry or get uncertain results when they try to navigate through them. The Internet is an enormous compilation of multivariate data. Several problems prevent effective and efficient knowledge discovery; better knowledge management techniques are required, and it is important to retrieve accurate and complete data. The hidde...

A Novel Term Weighing Scheme Towards Efficient Crawl of Textual Databases

The Hidden Web is the vast repository of informational databases available only through search form interfaces, accessible by typing a set of keywords into the search forms. Typically, a Hidden Web crawler is employed to autonomously discover and download pages from the Hidden Web. Traditional hidden web crawlers do not provide the search engines with an optimal search experience because ...

Query Interface Integrator For Domain Specific Hidden Web

Access to the Web today relies mainly on search engines. A large amount of data is hidden in databases behind search interfaces, referred to as the “Hidden Web”, which needs to be indexed in order to serve users’ queries. In this paper, database and data mining techniques are used for query interface integration (QII). The query interface must resemble the look and feel of local interf...

Adaptive Information Analysis in Higher Education Institutes

Information integration plays an important role in academic environments, since it provides a comprehensive view of education data and enables managers to analyze and evaluate the effectiveness of education processes. However, the problem with traditional information integration is the lack of personalization due to weak information resources or the unavailability of analysis functionality. In this ...

Journal title:
  • JCP

Volume: 5 (issue not listed)

Pages: not listed

Publication date: 2010